Content
Entities
- Find a well-performing NER model (cf. https://huggingface.co/flair/ner-english-ontonotes-fast, https://huggingface.co/stanfordnlp/stanza-en)
- Find a way to label a main character's name (e.g. Miller, prince) as more than simply a noun
- Extract the subject and predicate from each sentence. Explore which verbs are associated with each subject.
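One possible starting point for the character-labeling item, before committing to an NER model, is a simple gazetteer lookup. The sketch below is only an illustration using the standard library: the name list, the `CHARACTER` label, and the function name `label_characters` are assumptions, not part of any existing pipeline.

```python
import re

# Hypothetical gazetteer of surface forms that refer to main characters.
# "Miller" and "prince" come from the note above; everything else is assumed.
CHARACTER_NAMES = {"miller", "prince"}


def label_characters(text, names=CHARACTER_NAMES):
    """Return (token, label) pairs, tagging known character names as CHARACTER.

    Matching is case-insensitive, so a lower-case role such as "prince"
    is caught as well as a capitalized surname.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    return [(tok, "CHARACTER" if tok.lower() in names else "O") for tok in tokens]


labeled = label_characters("The prince asked Miller to wait.")
```

A lookup like this cannot disambiguate (a miller by trade vs. the character Miller), so it would likely serve as a baseline or as extra patterns fed to a proper NER model.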
Common phrases using:
- TextRank
- collocation/word frequency
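The collocation/word frequency idea can be sketched with the standard library alone. This is a minimal illustration, assuming a naive regex tokenizer and pointwise mutual information (PMI) as the association score; the function names and the `min_count` threshold are hypothetical choices, and a library such as NLTK would be the more likely tool in practice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def collocations(text, min_count=2):
    """Score adjacent word pairs by PMI, keeping pairs seen at least min_count times.

    Note: bigrams are counted across sentence boundaries in this sketch.
    """
    tokens = tokenize(text)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        scored.append(((w1, w2), count, pmi))

    # Highest PMI first; ties broken by raw count.
    return sorted(scored, key=lambda item: (-item[2], -item[1]))


sample = "the old mill stood by the river. the old mill was quiet. the river was wide."
top = collocations(sample)
```

Here `top` is a list of `((word1, word2), count, pmi)` tuples. The `min_count` filter matters because PMI tends to overrate pairs of rare words.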
Topic modeling
- Define cleaning tasks and stop words to improve topic model performance; right now the topics are too close together, forming a few main clusters that are difficult to distinguish
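As a rough sketch of what the cleaning step could look like, the snippet below lower-cases text, drops stop words and very short tokens, and flags terms that appear in most documents as candidates for a corpus-specific stop list. The stop-word set, the thresholds, and the function names are all assumptions for illustration; whether this actually separates the topics would need to be checked on the real corpus.

```python
import re
from collections import Counter

# Hypothetical starter stop-word list; a real one would be larger and corpus-specific.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "is", "it", "said"}


def clean_document(text, stop_words=STOP_WORDS, min_length=3):
    """Lower-case, tokenize, and drop stop words and tokens shorter than min_length."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words and len(t) >= min_length]


def frequent_terms(docs, max_doc_ratio=0.8):
    """Return terms found in more than max_doc_ratio of the tokenized documents.

    Terms that occur almost everywhere carry little topic signal and are
    candidates for the stop-word list.
    """
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    return {t for t, c in doc_freq.items() if c / len(docs) > max_doc_ratio}


cleaned = clean_document("The Prince said it was a long road to the mill.")
```

Running `frequent_terms` over the cleaned documents and feeding its output back into `stop_words` is one way to iterate on the list.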